ABSTRACT

The Digital Optical Computing Program within the National Science Foundation 
Engineering Research Center for Optoelectronic Computing Systems has as its specific 
goal research on optical computing architectures suitable for use at the highest possible 
speeds. The program can be targeted toward exploiting the time domain because other 
programs in the Center are pursuing research on parallel optical systems, exploiting 
optical interconnection and optical devices and materials. Using a general purpose 
computing architecture as the focus, we are developing design techniques, tools and 
architectures for operation at the speed of light limit. Experimental work is being 
done with the somewhat low speed components currently available but with architec- 
tures which will scale up in speed as faster devices are developed. The design algo- 
rithms and tools developed for a general purpose, stored program computer are being 
applied to other systems such as optically controlled optical communications networks. 
Center which is funded in part by the National Science Foundation under the Engineering Research Centers program grant No.
CDR 8622236 and by the Colorado Advanced Technology Institute (CATI), an agency of the State of Colorado. 



INTRODUCTION 

The Digital Optical Computing Program within the Optoelectronic Computing Sys- 
tems Center at the University of Colorado at Boulder is centered around the design and 
construction of an "all-optical," general purpose, stored program, digital computer. It is 
"all-optical" in the sense that logic level components have only optical inputs and out- 
puts, with all inter-component signals restricted to light. It is digital because this type of 
operation has proven successful in representing both arbitrary precision numbers and 
control information. The computer science term, stored program, means that instructions 
are stored as data to be manipulated by the computer itself. Thus it can "write its own 
programs" by, for example, running compilers. Finally, "general purpose" implies an 
instruction set which supports both the symbolic processing needed to manipulate pro- 
grams as well as numeric computation. The design is bit serial to minimize the number 
of active optical devices. Fiber delay lines are used for storage because they are passive 
elements, suited for storing serial information. Waveguide switches using the electro- 
optic effect are used to do logic. The bit serial design uses bandwidth, or time domain 
capacity, to achieve processing power. Since terabits per second are possible in one optical
channel, much complexity can be put into the time domain, making possible prototypes
with few components. To minimize active elements, we have adopted a simple but
complete instruction set without floating point arithmetic. Instructions have one address
with no complex addressing modes. A carefully optimized design gives a complete com- 
puter using only a few tens of switches. Optical fibers form all memory and intercon- 
necting components. There are no synchronizing elements such as flip flops, so all signal 
storage is in passive fiber delays. More important than demonstrating an optical com- 
puter is gaining more understanding of the use of the time domain in computer architec- 
ture and of time-space trade offs. Another goal is to transfer digital electronics 
knowledge to optics. There may be new ways to use optics which have no analogs in 
electronics, but it would be unwise to assume either a complete break with the extensive 
knowledge base in digital computing or to assume that it all transfers unchanged to 
optics. 

Prior work in optics which applies most directly to the current work is in communications
and signal processing. Single and multi-mode optical fiber and connector systems
have been developed and commercialized[1]. Static directional couplers are available
with specified power splitting ratios and can be used for fan-out or for combining
noninterfering signals. Electrically switched directional couplers[2] have reached a reasonable
degree of maturity, and are available from more than one source. They are used
for modulation, multiplexing and demultiplexing[3] of optical communications signals.
To get a component with all inputs and outputs optical, we add a photodetector,
amplifier and electrode driver to allow the switch to be optically controlled. The above
devices are used as shown in Fig. 1 to provide an implementation domain for digital optical
computing. It uses intensity encoding of bits and synchronous operation. A light
pulse at a clock time is a logic one, and no light represents a zero. The waveguide switch
computes the multiplexer function shown. Interconnection is done with single mode
fiber and fanout by 3 dB fiber couplers, which are also used to merge signals from two
sources when only one source carries a signal at a time. Memory is accomplished
entirely by the propagation time of optical pulses in fiber. The delay schematic shown in
the figure represents a coil of fiber of delay KΔ.
Figure 1: Fiber and Waveguide Switch Implementation Domain.


For our purposes, an electro-optically switched directional coupler can be viewed as 
a controlled exchange element with two optical waveguide inputs, the signals on which 
can be copied directly or exchanged onto two optical waveguide outputs. The direct,
"bar," state or exchange, "cross," state are under the control of an electrical potential. Its 
physical properties can be summarized at the systems level by loss and crosstalk from 
inputs to outputs in both states. Loss can be kept under 5dB and crosstalk less than -20 
dB. Optical fiber has an index of refraction of about 1.5, which implies a distance-time 
correspondence of 20 cm/nanosecond. At a wavelength of 1300 nanometers, losses of a 
few tenths of a dB/km are achievable with low chromatic dispersion. Standard connector
technology yields 1/2 dB or less loss per connection. See, for example, Cherin[4]. 
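
The distance-time correspondence quoted above follows directly from the index of refraction. The following is a minimal sketch of that arithmetic; the helper name `fiber_length_m` is hypothetical and the index of 1.5 is the approximate value given in the text.

```python
C_VACUUM_M_PER_S = 299_792_458.0  # speed of light in vacuum


def fiber_length_m(delay_s, index=1.5):
    """Length of fiber whose propagation time equals delay_s.

    Light travels at c / index inside the fiber, so length = delay * c / index.
    """
    return delay_s * C_VACUUM_M_PER_S / index


# One nanosecond of delay corresponds to roughly 20 cm of fiber.
print(f"{fiber_length_m(1e-9) * 100:.1f} cm per nanosecond")
```

The same function gives the coil length for a K bit delay once a bit period is chosen: `fiber_length_m(K * bit_period_s)`.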

The photodetector and electronics encapsulated in the logic component limit its 
speed, so the impact of this limit on our work must be assessed. Our emphasis is on 
architecture, so the question is whether useful concepts and techniques can be developed 
in spite of the limitation. Logic is done entirely by the waveguide switches, and any 
other logically complete optical component can be used with little impact on the system. 
As far as interconnections and delays are concerned, the clock frequency can be 
increased simply by scaling down all fiber lengths. This is the essence of the concept of 
a speed scalable architecture. At the architectural level, all physical time constants other 
than the speed of light are encapsulated in the switching component. The speed of a sys- 
tem with a speed scalable architecture can be increased by replacing the logic element 
with one m times as fast and scaling down all fiber lengths by a factor of m . The archi- 
tecture remains unchanged. 
To understand the potential value of speed scalable architectures, one can extrapolate
system speeds for devices which are still in the research stage. The time domain
capacity of optical data transmission is important because transmission is becoming more 
of a limit to computer speed than switching. It is physically possible to produce and pro- 
pagate 10 femtosecond optical pulses, which translates to a bandwidth of 100 
terabits/second. Haner[5] has actually demonstrated 100 femtosecond resolution in a 
time compressed waveform, which promises that 10 terabits/second may actually be 
achievable. A fast, logically complete optical switch has been demonstrated by Islam[6] 
who built NOR gates using 300 femtosecond solitons. His gates show that optical 
switching and transmission may attain similar speeds. A smaller, but significant, speed 
improvement is expected from integrated electro-optic switches, waveguides, detectors 
and electronics in a III-V materials system[7]. 
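
The bandwidth figures above come from taking one bit per pulse width. A small sketch of that conversion, assuming (as a simplification not stated in the text) that pulses are packed back to back with no guard time; `bit_rate_tbps` is a hypothetical helper name.

```python
def bit_rate_tbps(pulse_width_fs):
    """Upper-bound bit rate in terabits/second for one bit per pulse width."""
    pulse_width_s = pulse_width_fs * 1e-15
    return 1.0 / pulse_width_s / 1e12


# 10 fs pulses, 100 fs resolution, and 300 fs solitons, as cited in the text.
for width_fs in (10, 100, 300):
    print(f"{width_fs} fs -> {bit_rate_tbps(width_fs):.1f} Tbit/s")
```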

Using such bandwidths requires a speed scalable architecture. The architectural 
drivers implied by speed scalability are: 

all inter-component signals are restricted to light; 

there are no synchronizing memory elements; 

synchronization is done by controlling optical delays; 

optical signal quality must be restored, both in amplitude and timing; 

any logically complete device can substitute for the switches. 

Although a general purpose digital computer focuses the work, it is not expected to be 
the first competitive success of optical architectures. A near term application is in opti- 
cally switched optical communication networks. Packet switching requires only a small 
amount of logic. Even simple optical state machines can improve high speed communi- 
cations systems, which now require conversion between optics and electronics for the 
simplest switching. High speed controllers in hostile environments, such as particle 
accelerators, are also a potential application. In general purpose computing, optics will 
complement electronics long before it replaces it. Time (or frequency) multiplexing can
make high speed serial systems effective adjuncts to slower, parallel, electronic ones. 


BUILDING BLOCKS 

A digital computing architecture must include logic, interconnection, signal restoration
and memory. The ability to restore signals in both amplitude and timing is not
strictly a logical function, since it depends on the physical characteristics of signals. It
can be accomplished with a switch component by gating the system clock as shown in
Fig. 2. Amplitude is restored because the incoming optical pulse is physically switched
to the output. For timing restoration to be effective, the control signal must arrive earlier
than the clock and then be amplified and broadened in order to switch the full, correctly
timed clock pulse to the output, while the second output receives a restored complemented
copy of the control signal. Clock gating was used in electronic computers to
restore timing. Here it also restores the optical power level. This makes supplying optical
power a problem of producing multiple copies of a synchronized clock.

Figure 2: Signal Restoration in Amplitude and Timing.

The multiplexer function, $D = AC + B\bar{C}$, shown for the switch of Fig. 1, is logically
complete given the constants zero and one. In the pulse coded representation, zero is the 
absence of light, and one is a copy of the system clock. A circuit with both logic and 
memory is the serial binary adder which will add two binary numbers presented to its 
inputs low order bit first. It consists of a single full adder and a one clock period delay to 
store the carry used in computing the next higher order sum bit. Figure 3 shows the cir- 
cuit built from waveguide switches, 3 dB couplers for fanout, and a fiber delay for the 
carry. The switches connected to the inputs not only complement them but also do signal 
restoration. Switches S3 and S4 demonstrate AND, OR and exclusive OR functions. 



Figure 3: Serial Binary Adder with Carry Delay 
The circuit is independent of the length N of the binary numbers, but end conditions,
such as initializing the carry delay to a zero or discarding the high order carry, require 
more switches and timing signals to mark word boundaries. 
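
The claim that the multiplexer function is logically complete, and that one full adder plus a one bit carry delay adds numbers of any length, can be checked with a short logic-level model. This is only a sketch: `mux` models an ideal switch with control C selecting which input reaches output D, the gate arrangement inside `full_adder` is one possible construction and is not claimed to be the switch layout of Fig. 3, and all function names are hypothetical.

```python
def mux(a, b, c):
    """Ideal switch: returns (D, E) with D = A if C else B, E = B if C else A."""
    return (a, b) if c else (b, a)


def full_adder(x, y, carry_in):
    """Full adder built only from mux() and the constants 0 and 1."""
    # Gating the clock (constant 1) restores a signal and yields its complement.
    x, _ = mux(1, 0, x)
    y, not_y = mux(1, 0, y)
    c, not_c = mux(1, 0, carry_in)
    p, _ = mux(not_y, y, x)        # p = x XOR y
    total, _ = mux(not_c, c, p)    # sum = p XOR carry
    carry_out, _ = mux(c, x, p)    # carry = c if x != y, else x
    return total, carry_out


def serial_add(a_bits, b_bits):
    """Add two equal-length bit lists presented low order bit first."""
    carry = 0                      # carry delay initialized to zero
    out = []
    for x, y in zip(a_bits, b_bits):
        s, carry = full_adder(x, y, carry)
        out.append(s)
    return out                     # high order carry is discarded


def to_bits(value, width):
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


print(from_bits(serial_add(to_bits(11, 8), to_bits(6, 8))))  # 17
```

The loop body is the same for every bit, which is why the word length appears only in the initialization and in the decision to drop the final carry.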

Memory registers extend the carry delay of the binary adder with signal regeneration
and with read and write access to the register. Figure 4 shows a K bit register. Switch S1
regenerates data on every circulation through the loop, or once every K clock periods.
With switch S2 in the cross state, a one bit emerging from the K bit storage loop causes
switch S1 to copy a clock pulse into the loop. Zeros route the unconnected input B of S1
to the D output. The 3 dB coupler allows the register to be read and switch S2 provides
the ability to write new information by holding the Write input at logic one for K bit
periods. When such a delay loop is used as a register in a serial machine, its length is
usually equal to the computer word length, and its contents can be read or written once
per word time.

Multiple word memories use the same kind of storage loop shown in Fig. 4. With 
K bits per word, the length of an N word memory loop is NK bits. A scale of N counter
incremented once per word determines which word is currently passing switch S2, where
it can be read or written. This counter requires $m = \lceil \log_2 N \rceil$ bits. If m < K, the m bit
counter can be incremented during the passage of one word. To access a word, its 
address is compared to the counter value at each word time until a match occurs. A large 
memory can have several loops and an address of two parts, one of which selects a loop, 
while the other is compared to a counter. The number and size of loops is determined by 
the acceptable waiting time for an address match and by the physical limits on the loop 
capacity. Sarrazin[8] has examined the several physical limitations on memory loop 
capacity to establish a resolution of one part in 10^4 per degree C for a synchronous
storage loop. 
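
The sizing relations for a memory loop can be written out as a short calculation. This is a sketch under the definitions in the paragraph above; the function names and the example of 64 words of 16 bits are hypothetical and are not taken from the actual design.

```python
def counter_bits(n_words):
    """Bits needed by a scale of N counter: ceil(log2(N)), computed exactly."""
    return (n_words - 1).bit_length()


def loop_length_bits(n_words, k_bits):
    """Length of an N word loop with K bits per word, in bit times."""
    return n_words * k_bits


n_words, k_bits = 64, 16           # hypothetical memory size
m = counter_bits(n_words)
print("counter bits m =", m)
print("counter fits in one word time:", m < k_bits)
# One full circulation bounds the wait for an address match.
print("loop length =", loop_length_bits(n_words, k_bits), "bit times")
```

Splitting a large memory into several loops shortens the circulation time of each loop at the cost of extra loop-selection hardware, which is the trade-off described above.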

The serial counter design will be referred to several times. Figure 5 is a block
diagram of a four bit, scale of 16 serial counter. On the left is a four bit increment signal
consisting of a one in the low order bit position and three zeros. Below the half adder is
a stored four bit count value, with low order bit at the left. Above the half adder is a
carry bit. It and the count are stored in delay lines of one and four bit durations, respectively.
Use of an m bit delay line and placing the increment signal at the low order bit of
an m bit period are the only changes required to make it an m bit, scale of 2^m counter.
Figure 6(a) shows a logic description of the counter and its optical design is shown in
Fig. 6(b). The signal labeled Clk is an optical oscillator with pulses appearing once
every bit time. The signal Wck times a word of the binary count by producing a pulse
every four bit times. Including two switches required to derive Wck from Clk, the complete
design requires only five switches, making it a simple implementation target.

Figure 5: Block Diagram of a Bit Serial Counter.

Figure 6: Implementation of the Bit Serial Counter.
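
The behavior of the serial counter can be illustrated with a logic-level model of one word time. This is a sketch only: it assumes the increment pulse takes the place of the carry in the low order bit slot, so that the carry out of the high order bit is discarded, and the names `increment_word` and `value` are hypothetical.

```python
def increment_word(count_bits):
    """One pass of an m bit count (low order bit first) through the half adder."""
    carry = 0
    out = []
    for position, bit in enumerate(count_bits):
        # The increment signal is a one in the low order slot only; in the
        # other slots the half adder receives the carry delayed by one bit time.
        addend = 1 if position == 0 else carry
        out.append(bit ^ addend)   # half adder sum
        carry = bit & addend       # half adder carry, stored for the next bit
    return out


def value(bits):
    return sum(bit << i for i, bit in enumerate(bits))


count = [0, 0, 0, 0]               # four bit, scale of 16 counter
seen = []
for _ in range(16):
    count = increment_word(count)
    seen.append(value(count))
print(seen)                        # counts 1 through 15, then wraps to 0
```

Changing the length of the list passed in is the only change needed for an m bit counter, mirroring the change of delay line length described above.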

The design of Fig. 6(b) neglects a fundamental problem of speed scalable design. 
Delays in the circuit are taken to be zero except for those associated with explicit delay 
elements. Delays are actually distributed throughout the circuit, in connections, switches 
and electrode drivers. The delay distribution problem is to distribute delays to coordinate 
signal arrival times at the inputs to a switch. It is a result of interconnection delay being 
of the same order as logic delay and the absence of flip flop synchronization. Lumped 
delay designs using familiar digital design techniques must be transformed into realistic 
ones with delays which meet the physical requirements of components and layout. Oth- 
erwise, signals to be logically combined will not arrive simultaneously at the proper logic 
element. 
Figure 7 shows a two switch circuit to derive the word clock, Wck, for the counter
of Fig. 6 from the master clock, Clk. Part (a) shows the lumped delay design, which produces
one pulse out for every four in, and part (b) shows the design with a delay associated
with each signal path. Also shown are two equations which ensure that corresponding
inputs arrive simultaneously at switches S1 and S2, respectively, and inequalities
which characterize the minimum delays in the paths between outputs and inputs. The
delays $\delta_A$, $\delta_B$, $\delta_C$, $\delta_D$ and $\delta_E$ are associated with the five terminals of a waveguide
switch, while $\delta_S$ is associated with the fixed 3 dB coupler used as a signal splitter. By
adding length to the interconnections and adjusting the lumped delays, the equations and
constraints can be satisfied, provided no feedback loop has an inherent physical delay
longer than the specified lumped delay. Since the minimum lumped delay in a feedback
loop for a non-trivial sequential circuit is usually one clock period, the "optical length" or
latency of a switching element puts a lower limit on the clock cycle time. The latency is
not necessarily related to the bandwidth of an element. The topic of time multiplexed
architectures, which make use of high bandwidth logic in spite of long latency, will be
discussed later.


SYSTEM ARCHITECTURES 

At this point essential building blocks for a stored program computer (logic,
registers and multi-word memory) have been discussed. The experimental question is
whether a complete, stored program computer can be built with few enough switches to
be feasible as a near term prototype. The current cost and size of available waveguide
switches imply that a computer requiring hundreds to thousands of them would remain
only a paper design yielding no practical experience with speed scalable architectures.
The SCAMP[9] architecture is a carefully minimized design containing all of the features
of a general purpose computer except input/output. I/O will initially be supplied by the
monitor subsystem[10] necessary to control and make measurements on the computer.
For minimality, general registers are represented by a single accumulator. It, the program
counter, the instruction register and the memory counter are the only registers
accessible every word time.

Figure 7: The Delay Distribution Problem.

Along with minimizing registers, the instruction set is also kept small. Multiply, 
divide and floating point arithmetic are left for software, although preliminary work on 
multiply[11] and divide[12] hardware is in progress. The arithmetic logic unit (ALU) is
limited to and, or, not, add and shift. This has sufficed for many microprocessors and is
a reasonable first step. The design proceeded in two phases. First, logic, registers and 
memory were assembled under the assumption that the waveguide switches implemented 
perfect multiplexers. The complete design required only about 50 waveguide switches. 
The second phase used measured loss and crosstalk values to determine where to place 
signal restorers to meet the physical specifications of the switches and photodetectors. 
This second phase resulted in a design using about 75 switches. Although the design 
uses a 16 bit word length, only delay line lengths change to accommodate any word 
length which is no shorter than the memory address length plus six bits. Simulations 
verified the design at both the logic level and the physical level. 

The soliton gates[6] cited as a demonstration of very high speed logic used 20 
meters of fiber to obtain sufficient interaction length given the weak nonlinearity of glass. 
Such extreme ratios of reciprocal latency to bandwidth are not expected in mature optical 
devices, but interaction lengths of a few centimeters in terahertz bandwidth gates would 
not be surprising because of the large power densities which would otherwise be 
required. Although long latency limits the minimum feedback loop length, such gates 
would have the potential for hundred-fold time multiplexing or pipelining. Decoupling 
the duration of a switched pulse from the latency of the switching element by this means 
opens up important possibilities for optical logic devices. 

An architectural technique to make use of devices with high bandwidth but long
latency decouples consecutive bits by time multiplexing serial data streams. One can
multiplex several bit streams on the hardware of the SCAMP to yield several independent
computers. Such a time multiplexed multiprocessor requires multiplexing of processor
inputs and demultiplexing of outputs, as shown in Fig. 8. Since multiplexing and
demultiplexing do not require feedback, they can be implemented with long latency
devices. Time multiplexed multiprocessors have been built with electronics. An early
commercial one implemented the ten peripheral processors of the CDC 6600[13], and a
more recent pipelined multiprocessor, the Denelcor HEP[14], multiplexed up to 128
instruction streams on one set of processor hardware. Pipelined vector units in current
supercomputers use time multiplexing of independent vector components to achieve high
speed. Latency tolerance is incorporated at the level of numeric operations in arithmetic
pipelines and systolic arrays, but only in the highest speed designs is it a gate level concern
of the sort addressed by the delay distribution problem. More research is needed on
trade-offs and optimizations possible in designing systems pipelined at the gate level,
especially if such designs use no latches. The elimination of latches is not intrinsically
desirable, but since latching implies a device entering a stable state, and since time constants
associated with stable states are long compared to those of unstable states, the
highest speed designs may well avoid latches.

Figure 8: Time Multiplexed Multiprocessor

The immediate future of high speed optical logic is probably in communications
rather than general purpose computing. Packet switched communication networks, controlled
by information contained in the data being transmitted, have great utility. Since
information in high speed fiber networks arrives and departs in optical form, optically
controlled optical switching would benefit such networks. Since network control logic
can be simple, and the network is extended in space, existing expensive and large
waveguide switches are not as severe a limitation as in general purpose computing. A
specific architecture for a high speed, packet switched, optical communications network
is being studied[15]. It is based on three interacting ideas: 1) a network of N log N nodes
of fixed fan-in and fan-out in which a message needs to pass through only order log N
intermediate nodes to reach its destination; 2) "hot potato" routing, in which messages
are not stored in intermediate nodes; and 3) optical compression of data packets to
release bandwidth for use in network synchronization and control. Nodes in the network
not only do switching but are associated with hosts which originate and consume messages.
A ShuffleNet[16] network of $N \log_2 N$ nodes with two inputs and two outputs per
node and a maximum distance of $2\log_2 N - 1$ between any two nodes will be used. When
two incoming messages need to use the same output port, electronic switching nodes typically
store one of the conflicting messages for later transmission. Rather than do
electro-optic conversion for storage, the network uses "hot potato" or deflection routing
to send one of the messages through the wrong output port to make its way by another
path to its destination. Finally, conflicts are minimized and synchronization is simplified
if message packets are separated in time by large gaps. Time compression of the optical
data by wavelength multiplexing or grating techniques can create such gaps, thus trading
potential data bandwidth for ease of network control.
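
The node count and maximum distance quoted for the ShuffleNet can be checked by brute force on small cases. The sketch below assumes the usual perfect shuffle wiring between successive columns, with the last column wrapped around to the first; the network of [16] is not described in this text, so the wiring and the names `shufflenet` and `max_distance` are assumptions. Here k plays the role of log2 N.

```python
from collections import deque


def shufflenet(k):
    """Adjacency lists for k columns of 2**k nodes, two outputs per node."""
    rows = 2 ** k
    return {
        (col, row): [((col + 1) % k, (2 * row + j) % rows) for j in (0, 1)]
        for col in range(k)
        for row in range(rows)
    }


def max_distance(graph):
    """Largest hop count between any ordered pair of nodes (breadth-first search)."""
    worst = 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in graph[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        worst = max(worst, max(dist.values()))
    return worst


for k in (2, 3):
    net = shufflenet(k)
    print(f"k={k}: {len(net)} nodes, maximum distance {max_distance(net)}")
```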

Another important architecture uses fiber delays and exchange switches for time slot
interchange. This time domain permutation is useful both in accessing information from
a serial delay line in an order different from that in which it is stored and to allow the
time multiplexed independent processors of Fig. 8 to exchange information. Time is
divided into slots containing information, with a frame consisting of N sequential time
slots. Time slot interchange means moving information from slots of an input signal into
slots in different relative positions of the output frame. Such a permutation is associated
with a frame delay. A time slot interchange architecture has immediate application in
time multiplexed telecommunications channels. Time multiplexed signals are most often
switched by demultiplexing into separated channels, switching in the space domain, and
re-multiplexing the result. Thompson's[17] architecture uses waveguide switches to
demultiplex an input stream into individual time slots, uses fiber loops to individually
delay them, and uses more switches to multiplex them into the output stream. Leaving
out switches needed to vary the delays, 2N - 2 switches are used in the multiplexer and
demultiplexer.

Ramanan[18] applied techniques developed for multistage switching networks in 
the space domain to time domain permutation. The basic building block of the architecture
uses a switch connected to a delay loop of size Δ in a feedback configuration to
selectively interchange pairs of time slots separated by a fixed time Δ, a multiple of the
slot time. Figure 9 shows the situation for a Δ of one slot time. Any number of pairs can
be interchanged by setting the control for exchange (x) for all time slots except the 
second of a pair to be exchanged, for which it is set for straight connection (=). The 
Benes[19] network, with IN \ 0 g 2 N - 1 exchange switches, is a universal space domain 
switch. Ramanan's time domain analog of this network can perform any time slot permutation
on a frame of N slots, with N a power of two, using only $2\log_2 N - 1$ of the above building blocks.
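
The control rule for the basic building block can be traced slot by slot with a small model of one switch and its feedback delay loop. This is a sketch at the level of slot contents only; the function name `exchange_block` and its parameters are hypothetical, the loop delay is given in slot times, and the pairs to be exchanged are assumed not to overlap.

```python
from collections import deque


def exchange_block(frame, delta, first_slots):
    """Swap slot i with slot i + delta for each i in first_slots."""
    loop = deque([None] * delta)                  # fiber loop holding delta slots
    straight = {i + delta for i in first_slots}   # second slot of each pair
    out = []
    for t in range(len(frame) + delta):
        arriving = frame[t] if t < len(frame) else None
        emerging = loop.popleft()
        if t in straight:
            # Straight connection: input passes to output, loop recirculates.
            out.append(arriving)
            loop.append(emerging)
        else:
            # Exchange: input enters the loop, loop contents go to the output.
            out.append(emerging)
            loop.append(arriving)
    return out[delta:]                            # whole frame is delayed by delta


print(exchange_block(list("abcd"), 1, {0}))       # ['b', 'a', 'c', 'd']
print(exchange_block(list("abcd"), 2, {0, 1}))    # ['c', 'd', 'a', 'b']
```

With no pairs selected the block is in the exchange state for every slot and simply delays the frame by the loop length.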

One block with delay loop of length N/2 can selectively exchange any pair of slots
separated by N/2 units. The frame suffers an overall delay of N/2 slot times. If we now
use an N/2 exchange switch at both input and output of a universal interchanger for
frames of length N/2, as shown in Fig. 10, we have a recursive construction for a universal
interchanger of length N. The input stage allows time slots to be selectively
exchanged between first and last half frames, the center section permutes each half frame
arbitrarily, and the output stage again allows selective exchange of pairs between half
frames. This is sufficient to apply the Benes looping algorithm[20] to show that if the
center can permute frames of N/2 slots, the whole network can permute frames of length
N. If N is a power of two, continuing the recursion until a one block exchanger for
adjacent slots is left in the center yields a general time slot interchanger with $2\log_2 N - 1$
switches and delay loops, as shown in Fig. 11. An alternative design in which the delays
increase toward the center as powers of two is also possible but more difficult to
describe. Thompson's design requires 2N - 2 switches for the demultiplexer and multiplexer
alone. For permuting 1024 time slots, the new design requires 19 switches compared
with more than 2046 for the other architecture. This new architecture shows how
optics can give insight into time-space tradeoffs which may even have advantages for
electronic implementation. Since time slot interchange forms a large fraction of all
telecommunications switching, the practical value of the result may be large.

Figure 9: Exchange of Time Slot Pairs

Figure 10: Recursive Construction of a General Time Slot Interchanger.

Figure 11: A Time Slot Interchanger with $2\log_2 N - 1$ Switches.
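
The switch counts for the two architectures, and the loop delays produced by the recursive construction, can be tabulated directly. A minimal sketch with hypothetical function names; the count for the demultiplex/multiplex design covers only the 2N - 2 switches named in the text.

```python
def demux_mux_switches(n_slots):
    """Switches in the demultiplexer and multiplexer alone: 2N - 2."""
    return 2 * n_slots - 2


def recursive_loop_delays(n_slots):
    """Loop delays, in slot times, from input to output: N/2, ..., 2, 1, 2, ..., N/2."""
    assert n_slots >= 2 and n_slots & (n_slots - 1) == 0, "N must be a power of two"
    outer = []
    d = n_slots // 2
    while d > 1:
        outer.append(d)
        d //= 2
    return outer + [1] + outer[::-1]


n = 1024
delays = recursive_loop_delays(n)
print("recursive design:", len(delays), "switches")       # 2*log2(N) - 1 = 19
print("demux/mux design:", demux_mux_switches(n), "switches")
print("delays for N = 8:", recursive_loop_delays(8))
```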

TOOLS AND TECHNIQUES 

Simulation is an important tool in realizing computer architectures, which by nature 
involve high complexity. The SCAMP design uses many clever tricks to reduce the 
number of switches. Since clever tricks can backfire, a logic level verification of the 
design is the first important step. The tool built to do this is an event driven simulator 
called HATCH[21]. As a result of the absence of flip flops in the design, it is a continu- 
ous time simulator. Clocked timing is introduced, as in the actual system, by an object 
called a clock which produces a standard repetitive signal. The HATCH software is 
object oriented so that it can evolve to meet new simulation needs by the addition of new 
object types and methods. 

The first evolutionary challenge met by HATCH was to solve the delay distribution 
problem, described in connection with Fig. 7. The circuit is taken as a graph whose 
nodes are waveguide switches or 3 dB couplers. The edges of the graph represent inter- 
connections between elements. A delay vector, with one component for each edge, 
characterizes the delays in the design. For the lumped delay design, many of these com- 
ponents are zero. In a real design, each delay vector component is always greater than 
some minimum which represents the path length through components, length of couplers, 
length of the interconnecting fiber, and for some edges, the latency of photodetector and 
electrode driver circuitry. The physical constraints are thus embodied in a minimum 
delay vector over the edges of the graph. The linear equations ensuring synchronized
signal arrival are derived from the lumped delay design. A delay vector having each 
component greater than or equal to that of the minimum vector and satisfying the linear 
system is a possible design, and that having the least extra delay is the solution. Three 
algorithms for solving this constrained minimization problem were studied and com- 
pared: the simplex method[22], the shortest path method[23], and the local distribution 
method[24]. The study[24] showed that the local distribution method converged well, 
with delays increasing monotonically up from the lumped delay values to those of the 
solution vector. The simplicity of this algorithm gave it better performance than the
other two, so it was included in HATCH to do delay distribution.
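
To make the constraint structure concrete, the sketch below pads delays in a small feed-forward circuit so that every edge meets its minimum delay while the relative timing of the inputs to each element stays the same as in the lumped delay design. It is a simple longest-path pass, not one of the three algorithms cited above; it does not handle feedback loops and does not claim to minimize total extra delay. The function name, the node names and the picosecond values are hypothetical.

```python
def distribute_delays(edges, order):
    """Assign a delay to every edge of a feed-forward circuit.

    edges: {(src, dst): (lumped_delay, minimum_delay)} in integer picoseconds.
    order: nodes listed so that every src appears before its dst.
    Each node is shifted later by the smallest amount that lets all of its
    incoming edges meet their minimum delays.
    """
    shift = {node: 0 for node in order}
    for dst in order:
        for (u, v), (lumped, minimum) in edges.items():
            if v == dst:
                shift[dst] = max(shift[dst], shift[u] + minimum - lumped)
    return {
        (u, v): lumped + shift[v] - shift[u]
        for (u, v), (lumped, _minimum) in edges.items()
    }


# A clock feeds switch S directly and through a coupler X with a one period
# (1000 ps) lumped delay on the second path.
edges = {
    ("clk", "S"): (0, 300),
    ("clk", "X"): (0, 200),
    ("X", "S"): (1000, 500),
}
delays = distribute_delays(edges, ["clk", "X", "S"])
print(delays)
```

In this example the two paths into S still differ by exactly one clock period after padding, which is the synchronization condition the linear equations express.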

Signal quality management is also included in HATCH. Power losses can make 
ones appear to be zeros while crosstalk in the switches may cause zeros to accumulate 
noise and appear to be ones. At each switch control terminal, a threshold decision distin- 
guishes zeros from ones. A signal restorer must be placed in any optical path from a 
standard clock which has enough loss for a logical one to be below threshold or enough 
crosstalk for a zero to be above threshold. If loss and crosstalk specifications are associ- 
ated with each device, HATCH[25] can compute signal degradation associated with a 
specified path or identify the worst case path. A designer can thus use it to add restoring 
switches to a design which assumes ideal elements. This was done for the SCAMP 
design assuming a loss of 5 dB, a crosstalk of less than -20 dB and a control terminal
photodetector threshold of -19 dBm, obtaining a signal restored design for SCAMP 
requiring only 75 switches. 
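
The loss side of this check is simple decibel bookkeeping along a path. The sketch below uses the 5 dB switch loss and -19 dBm threshold quoted above; the launch power of 0 dBm for the standard clock is a hypothetical value chosen for illustration, crosstalk accumulation is not modeled, and the function names are not those of HATCH.

```python
def power_at_end_dbm(path_losses_db, launch_dbm):
    """Power of a logical one after the listed losses along a path."""
    return launch_dbm - sum(path_losses_db)


def needs_restorer(path_losses_db, launch_dbm, threshold_dbm=-19.0):
    """True if a one would arrive below the photodetector threshold."""
    return power_at_end_dbm(path_losses_db, launch_dbm) < threshold_dbm


CLOCK_DBM = 0.0      # hypothetical clock launch power
SWITCH_LOSS_DB = 5.0

for n_switches in (3, 4):
    path = [SWITCH_LOSS_DB] * n_switches
    print(n_switches, "switches:",
          power_at_end_dbm(path, CLOCK_DBM), "dBm,",
          "restorer needed" if needs_restorer(path, CLOCK_DBM) else "ok")
```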

The extended HATCH is a general tool to design fiber optic and waveguide switch 
based systems. Starting with a lumped delay design, logic simulation with ideal gates 
verifies the sequential behavior. Component delays then allow HATCH to produce fiber
lengths for a distributed delay design. When loss and crosstalk specifications are added, 
HATCH identifies critical paths for insertion of signal restoring switches. The final 
design is then simulated with delay, loss and crosstalk specifications to produce loga- 
rithmically scaled plots of signal amplitudes versus time under worst case loss and 
crosstalk assumptions. An overview of the functionality of HATCH is illustrated in Fig. 12. 

It has been mentioned that techniques for gate level time multiplexing can help 
overcome the effects of latency. A specific example is the serial counter design. The 
shortest feedback loop in the counter of Fig. 6 has a length of one bit time. Since it 
passes through two switches and a 3 dB coupler, it sets a lower limit on the bit rate of the 
counter. Time multiplexing can increase the effective counter bandwidth by multiplex- 
ing more than one independent bit stream on one set of counter hardware. This gives the 
effect of several simultaneous counters, each running at the original bit rate. A block 
diagram for this scheme with two multiplexed counts appears in Fig. 13. The counter 
associated with the bits in the white boxes is about to be incremented from 3 to 4 while 
that associated with the stippled boxes is about to change from 8 to 9. A carry feedback 
generated from a bit at time t need not combine with a bit arriving at the increment input 
any sooner than two bit times later. The two Wck input streams can be multiplexed 
using only differential delay and a 3 dB coupler, and the count outputs can be demulti- 
plexed by one switch toggling at the effective bit rate. 
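
The multiplexing idea can be checked with a small behavioral sketch. It models only the logic, not the optical hardware: the shared counter is reduced to one half adder whose carry feedback is a delay line as long as the number of interleaved streams, and the increment request is assumed to arrive as a one at the low order bit time of each word. The function names are hypothetical and chosen for illustration.

```python
from collections import deque


def interleave(words, nbits):
    """Bit-interleave several words, each sent low order bit first."""
    return [(w >> i) & 1 for i in range(nbits) for w in words]


def deinterleave(stream, nwords):
    """Recover the words from a bit-interleaved, low-order-first stream."""
    return [
        sum(bit << i for i, bit in enumerate(stream[k::nwords]))
        for k in range(nwords)
    ]


def shared_serial_increment(stream, increments):
    """Increment interleaved serial counts with one shared half adder.

    The carry produced at bit time t re-enters the adder len(increments)
    bit times later, so each interleaved count sees only its own carries.
    """
    carry_line = deque(increments)  # feedback delay line, preloaded with increments
    out = []
    for bit in stream:
        carry_in = carry_line.popleft()
        out.append(bit ^ carry_in)          # sum bit
        carry_line.append(bit & carry_in)   # carry for the same count's next bit
    return out


# Two four-bit counts, 3 and 8, share one set of hardware.
stream = interleave([3, 8], nbits=4)
result = deinterleave(shared_serial_increment(stream, [1, 1]), nwords=2)
print(result)  # [4, 9]
```

With a feedback delay of two bit times, the two counts advance from 3 to 4 and from 8 to 9 without interfering, which is the situation described for Fig. 13.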

Figure 12: The HATCH Design Support System. 



Figure 13: Two Independent Counters Multiplexed on the Same Hardware 

EXPERIMENTS 

The demonstration of a prototype optical computer involves several intermediate 
experiments. From the architecture viewpoint, a simple feedback state machine is the 
first step. We chose the one out of four scaler of Fig. 7 driving the counter of Fig. 6. The 
count value delay line demonstrates one word storage, so the second step is the multi- 
word memory loop, which also requires a binary counter and a serial comparator. The 
memory and one word registers will hold operands and result for an arithmetic unit, 
which will be the third subunit built. The instruction fetch, decode and execute cycle 
will be implemented last. The current status of experiments is between the counter and 
memory demonstrations. 

The scaler of Fig. 7 is a feedback state machine, but is simpler than the counter 
because it is self stabilizing if a bit is lost or gained. An infrequent bit error in the scaler 
would go undetected on an oscilloscope. The counter, on the other hand, has a period of 
64 bits, and single bit errors have a large influence on its output. The optical scaler and 
binary counter combination has been built and tested[26] yielding the output waveform 
for a 50 MHz clock rate shown in Fig. 14. The complemented count available at the 
unused output of SW5 in Fig. 6 is shown, low order bit first reading left to right. 
Changing two fiber lengths yields a six bit, scale of 64, counter, and this device was 
also built and operated at a 50 MHz clock rate. A modified design with a shortened carry 
feedback loop was built and operated at 100 MHz. Time multiplexing was also applied to 
increase the effective counter bandwidth: two independent count values, independently 
incremented by interleaved Wck signals, were interleaved on one set of counter hardware, 
and this dual counter operated successfully at an effective 100 MHz rate. 

Figure 14: Output of the Four Bit Optical Counter 
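
The 64 bit period of the four bit counter follows from 16 count values of four bits each. The sketch below generates the ideal complemented, low-order-first bit stream that an error-free counter would produce. It is an illustration only: it assumes one bit per clock period and an arbitrary starting count of zero, neither of which is stated in the text.

```python
def complemented_count_stream(nbits, ncounts):
    """Ideal serial output: each count complemented, low order bit first."""
    stream = []
    for count in range(ncounts):
        for i in range(nbits):
            stream.append(1 - ((count >> i) & 1))
    return stream


bits = complemented_count_stream(nbits=4, ncounts=16)
period_bits = len(bits)                # 16 words of 4 bits
period_seconds = period_bits / 50e6    # assuming one bit per 50 MHz clock period
print(period_bits, bits[:8])
```

Because every bit position of the pattern is determined, a single dropped or inserted bit shifts or corrupts the displayed waveform, which is why the counter is a stricter test than the self-stabilizing scaler.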


CONCLUSIONS 

The work discussed here primarily exploits the time domain to make potential use 
of high optical bandwidth, although the packet switched communications network also 
includes significant spatial parallelism. If this work does not directly address the use of 
spatial parallelism in optics, it also does not conflict with it. The ideas of speed scalable 
architectures should ideally be combined with the parallel optical designs being pursued 
effectively by other groups [27] [28] using synchronous operation and latching gates. 
The most parallel system running at the highest possible speed is the ideal optical com- 
puter, although the time slot interchanger shows that at least some interesting systems are 
strictly serial. 

This work also does not directly address the problem of producing or using an ideal 
optical device, which is fast, small, can be highly integrated, and uses little power. These 
architectures would smoothly scale up in speed with the availability of such a device, but 
our work has no device development component as such. The size, speed and cost of the 
LiNbO3 waveguide switches are such that the specific implementation of the prototype 
architecture discussed here would not be competitive as a general purpose computer, 
although it could have special purpose application as a very high speed controller in sys- 
tems where data is optical to begin with. 

Optical computing helps in understanding the architectural problems associated 
with very high speed digital computing. Electromagnetic radiation and induction effects 
are avoided, and experimental demonstrations of both communications and switching at 
terabit per second bandwidths exist. Current digital architectures are heavily influenced 
by the assumptions of arbitrary fanout and instantaneous signal propagation within 
moderately complex subsystems. As switching speeds become faster and power more of 
a concern, both assumptions prevent architectures from scaling up in speed. This work 
involves latch-free designs in which finite signal propagation time is fundamental. Such 
speed scalable designs can take advantage of higher speed devices as they become avail- 
able. Tools such as the delay distribution algorithms are essential to this style of design. 
Optics provides an excellent environment in which to study speed of light limited 
architectures, which are of increasing concern in electronic computer design as well. 

The systems described here are not general purpose supercomputers. The results 
show that designing an optical computer involves much more than simply inserting an 
"optical transistor" into an existing design. The maturity and commercial development 
of digital electronics suggest that an all-optical computer is not imminent. Optics will 
probably find its way gradually into digital computers, starting from the fibers already 
used to connect cabinets in large, high speed systems. Although optical architectures 
may well be different from electronic ones in important respects, they will probably build 
on the digital design knowledge base on which electronic computers rest. Optical com- 
puters will eventually combine spatial parallelism with high speed design constrained by 
the speed of light limit, as will future electronic computers. In the meantime, a better 
understanding of speed of light limited digital systems shows great promise for immedi- 
ate applications. Communications systems can benefit from even limited optical process- 
ing. Time critical tasks in signal processing are another area in which significant applica- 
tions may exist. Perhaps even more important is the fact that the speed of light limit is a 
universal phenomenon, not just an optical one. By studying the time-space tradeoffs in 
the optical domain, insight may be gained into the fundamental nature of physical reali- 
zations of the mathematical model which constitutes computation. 